Convolutional Neural Networks as Feature Generators for Near-duplicate Video Detection
Author
Abstract
With ever-increasing online repositories of video content, accurate and fast near-duplicate detection is important. It can help deter piracy and the distribution of illegal content. It can also improve the user experience when searching existing video collections, by de-duplicating results, by providing contextual data such as links to similar videos, or by helping to extract metadata through content clustering. Existing methods trade off frame-level accuracy, using complex spatial annotations such as differences of Gaussians, against computational efficiency, using video-level annotations of color and motion. Deep neural networks are becoming increasingly powerful and practical, and they offer a potential middle ground: they provide a per-frame signature in terms of the dominant classification while remaining relatively fast. In this implementation, pre-trained networks from the Caffe project, including AlexNet, GoogleNet and the R-CNN networks, are used to construct a searchable database of video signatures that can be queried by looking at the intersection of dominant features across frames. On an evaluation dataset of approximately 250 videos, the R-CNN network is shown to produce a set of features that is resilient to common distortions, such as small rotations, cropping, re-encoding, and changes to color and brightness. ROC curves show that the network matches known and distorted videos with a high degree of accuracy while rejecting novel content. These results suggest that neural-network-based near-duplicate detection is both feasible and accurate. However, the current implementation is constrained by inherent limitations of the pre-trained networks, which were trained on a limited number of labels and are sensitive to strong color or rotational changes. Future work will look at training a new neural network, and at adapting the feature database to similar-video search and metadata extraction.
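The abstract describes per-frame signatures built from each frame's dominant classification, stored in a database that is queried by intersecting dominant features across frames. Below is a minimal sketch of that idea under several assumptions. Per-frame class scores are taken as given, so running AlexNet, GoogleNet or R-CNN through Caffe is not shown. A frame's signature is its top-k labels. A video's signature is the set of labels that are dominant in at least `min_frames` frames. Candidates are found through an inverted index and ranked by Jaccard overlap. The names `frame_signature`, `video_signature`, `SignatureIndex`, `top_k` and `min_frames`, as well as the Jaccard scoring, are hypothetical choices for illustration, not the paper's implementation.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical input format: one dict per sampled frame, mapping class label
# to the CNN's score for that label (e.g. a softmax output). Running
# AlexNet, GoogleNet or R-CNN to obtain these scores is not shown here.
FrameScores = Dict[str, float]


def frame_signature(scores: FrameScores, top_k: int = 1) -> Set[str]:
    """Return the top_k highest-scoring (dominant) labels of one frame."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:top_k])


def video_signature(frames: List[FrameScores], top_k: int = 1,
                    min_frames: int = 1) -> Set[str]:
    """Keep the labels that are dominant in at least min_frames frames."""
    counts: Counter = Counter()
    for scores in frames:
        counts.update(frame_signature(scores, top_k))
    return {label for label, n in counts.items() if n >= min_frames}


class SignatureIndex:
    """Inverted index from dominant label to the videos that contain it."""

    def __init__(self) -> None:
        self.signatures: Dict[str, Set[str]] = {}
        self.postings: Dict[str, Set[str]] = defaultdict(set)

    def add(self, video_id: str, signature: Set[str]) -> None:
        self.signatures[video_id] = signature
        for label in signature:
            self.postings[label].add(video_id)

    def query(self, signature: Set[str]) -> List[Tuple[str, float]]:
        """Rank stored videos that share a label with the query by Jaccard overlap."""
        candidates: Set[str] = set()
        for label in signature:
            candidates |= self.postings.get(label, set())
        results = []
        for video_id in candidates:
            stored = self.signatures[video_id]
            overlap = len(signature & stored) / len(signature | stored)
            results.append((video_id, overlap))
        # Highest overlap first; ties broken by id for a stable order.
        return sorted(results, key=lambda r: (-r[1], r[0]))


if __name__ == "__main__":
    index = SignatureIndex()
    # Frame scores are made-up placeholders, not data from the paper.
    index.add("cats", video_signature([{"tabby": 0.8, "lynx": 0.2},
                                       {"lynx": 0.7, "tabby": 0.3}]))
    index.add("mixed", video_signature([{"tabby": 0.9, "lynx": 0.1},
                                        {"sports car": 0.6, "tabby": 0.4}]))
    query = video_signature([{"tabby": 0.7, "lynx": 0.3},
                             {"lynx": 0.55, "tabby": 0.45}])
    print(index.query(query))  # [('cats', 1.0), ('mixed', 0.3333333333333333)]
```

Applying a threshold to the returned overlap gives a match/no-match decision. Sweeping that threshold over known, distorted and novel videos is one way to trace ROC curves like those the abstract reports. Raising `top_k` or lowering `min_frames` makes signatures more inclusive at the cost of more false candidates.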
Similar resources
An efficient method for cloud detection based on the feature-level fusion of Landsat-8 OLI spectral bands in deep convolutional neural network
Cloud segmentation is a critical pre-processing step for any multi-spectral satellite image application. In particular, disaster-related applications, e.g., flood monitoring or rapid damage mapping, which are highly time and data-critical, require methods that produce accurate cloud masks in a short time while being able to adapt to large variations in the target domain (induced by atmospheric c...
Hand Gesture Recognition from RGB-D Data using 2D and 3D Convolutional Neural Networks: a comparative study
Despite considerable advances in recognizing hand gestures from still images, there are still many challenges in the classification of hand gestures in videos. The latter comes with more challenges, including higher computational complexity and the arduous task of representing temporal features. Hand movement dynamics, represented by temporal features, have to be extracted by analyzing the total fr...
Introducing a method for extracting features from facial images based on applying transformations to features obtained from convolutional neural networks
In pattern recognition, features denote measurable characteristics of an observed phenomenon, and feature extraction is the procedure of measuring these characteristics. A set of features can be expressed by a feature vector, which is used as the input data of a system. An efficient feature extraction method can improve the performance of a machine learning system such as face recognit...
Receptive Field Encoding Model for Dynamic Natural Vision
Introduction: Encoding models are used to predict human brain activity in response to sensory stimuli. The purpose of these models is to explain how sensory information is represented in the brain. Convolutional neural networks trained on images are capable of encoding magnetic resonance imaging data of humans viewing natural images. Considering the hemodynamic response function, these networks are ...
Recognition of Visual Events using Spatio-Temporal Information of the Video Signal
Recognition of visual events as a video analysis task has become popular in the machine learning community. While traditional approaches for detecting video events have been used for a long time, recently developed deep learning based methods have revolutionized this area. They have enabled event recognition systems to achieve detection rates which were not reachable by traditional approac...